Use of linguistic features for improving English-Persian SMT

Authors

Zakieh Shakeri

Neda Noormohammadi

Shahram Khadivi

Noushin Riahi

Abstract

In this paper, we investigate the effect of linguistic information on statistical machine translation (SMT) for the English-Persian language pair. We choose part-of-speech (POS) tags as the auxiliary linguistic feature. A monolingual Persian corpus with POS tags is prepared, and the tag set is kept small. Using a POS tagger trained on this corpus, we apply a factored translation model. We also create manual reordering rules that try to harmonize the word order of Persian and English. In the experiments, the factored translation model performs better than the unfactored model. In addition, the manual rules, which comprise only a few local reordering rules, increase the BLEU score compared to a monotone distortion model.
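The two ideas in the abstract, POS-tagged (factored) input and local reordering rules, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the `word|TAG` token format (a pipe-separated layout of the kind used by factored SMT toolkits), and the single adjective-noun swap rule, which reflects the fact that Persian adjectives normally follow the noun. It is not the authors' rule set or pipeline, which the abstract does not specify.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag); the tag names are illustrative


def to_factored(tagged: List[Tagged]) -> str:
    """Render a tagged sentence as space-separated word|TAG tokens."""
    return " ".join(f"{word}|{tag}" for word, tag in tagged)


def reorder_adj_noun(tagged: List[Tagged]) -> List[Tagged]:
    """Hypothetical local rule: swap each ADJ NOUN pair into NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


# Toy English source sentence with assumed POS tags.
sentence = [("the", "DET"), ("red", "ADJ"), ("book", "NOUN"),
            ("is", "VERB"), ("here", "ADV")]

print(to_factored(reorder_adj_noun(sentence)))
# prints: the|DET book|NOUN red|ADJ is|VERB here|ADV
```

Applying such a rule to the source side before training moves English word order closer to Persian, so that a monotone (no-reordering) decoder has less to recover on its own.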


Similar resources

Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora

Ebrahim Ansari et al. 2017. Using English as pivot to extract Persian-Italian parallel sentences from non-parallel corpora. In "Applications of Comparable Corpora", edited book, Berlin Linguistic Press (ed.). The effectiveness of a statistical machine translation (SMT) system depends heavily on the amount of parallel corpus used in the training phase. For low-resource l...


English and Persian Sport Newspaper Headlines: A comparative study of linguistic means

Using rhetorical figures in specialized languages like the language of newspaper headlines is common. The present study attempted to conduct a contrastive analysis of the English and Persian sport newspaper headlines related to the 2014 FIFA World Cup. Toward this end, a corpus consisting of 400 English and 400 Persian headlines published from 12 June to 13 July 2014 was c...


A Persian-English Cross-Linguistic Dataset for Research on the Visual Processing of Cognates and Noncognates

Finding out which lexico-semantic features of cognates are critical in cross-language studies and comparing these features with noncognates helps researchers to decide which features to control in studies with cognates. Normative databases provide necessary information for this purpose. Such resources are lacking in the Persian language. We created a dataset and determined norms for the essenti...


A Comparative Study of English and Persian Advertising Slogans: Linguistic Means through the Sands of Time

This study was a contrastive analysis of the evolution of English and Persian advertising slogans to investigate their similarities/differences in using rhetorical figures, and the evolution in the use of these figures in the slogans of each language. Thus, 800 Persian and English slogans from the last four decades were collected. Lapsanka's framework (2006) including different aspects with som...



Publication date: 2012
